Skip to content

feat(quantized): VNNI INT8 GEMM via VPDPBUSD (sprint W3-C)#128

Merged
AdaWorldAPI merged 1 commit into
masterfrom
claude/burn-W3C-vnni-gemm
Apr 30, 2026
Merged

feat(quantized): VNNI INT8 GEMM via VPDPBUSD (sprint W3-C)#128
AdaWorldAPI merged 1 commit into
masterfrom
claude/burn-W3C-vnni-gemm

Conversation

@AdaWorldAPI
Copy link
Copy Markdown
Owner

Closes parity item (12) — INT8 GEMM accelerated via AVX-512 VNNI's VPDPBUSD instruction (4-element u8×i8→i32 dot product, 64 multiply-accumulates per zmm). Falls back to scalar int8_gemm_i32 on hardware without VNNI.

What ships:

  • src/hpc/vnni_gemm.rs (387 LOC): int8_gemm_vnni public API, _mm512_dpbusd_epi32 inner kernel, scalar fallback path
  • src/hpc/simd_caps.rs: avx512vnni: bool field added to SimdCaps, is_x86_feature_detected!("avx512vnni") detection wired
  • src/hpc/mod.rs: pub mod vnni_gemm declaration

Hardware coverage: Ice Lake, Sapphire Rapids, Zen 4 (with AVX-512), Tiger Lake. Fallback works on any x86_64 / ARM / scalar.

Tests: 11 passing (4×4, 16×16, 17×17 tail, 1×1 edge, mixed values).

Note: type-cast fix applied to _mm512_loadu_si512 / _mm512_storeu_si512 (*const i32*const __m512i, *mut i32*mut __m512i) per Rust 1.94 intrinsic signatures.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj

Closes parity item 12 — INT8 GEMM accelerated via AVX-512 VNNI's VPDPBUSD
instruction (4-element u8×i8→i32 dot product). Falls back to scalar
int8_gemm_i32 on hardware without VNNI.

What ships:
- src/hpc/vnni_gemm.rs (387 LOC): int8_gemm_vnni public API,
  has_vnni() detection, _mm512_dpbusd_epi32 inner kernel, scalar fallback
- src/hpc/simd_caps.rs: avx512vnni: bool field added to SimdCaps,
  is_x86_feature_detected!("avx512vnni") detection wired
- src/hpc/mod.rs: pub mod vnni_gemm declaration

Hardware coverage:
- AVX-512 VNNI: Ice Lake, Sapphire Rapids, Zen 4 (with AVX-512), Tiger Lake
- Fallback: any x86_64 / ARM / scalar

Tests: 11 passing (4×4, 16×16, 17×17 tail, 1×1 edge, mixed values).
Total lib tests: 1817+ pass.

Note: type-cast fix applied to _mm512_loadu_si512 / _mm512_storeu_si512
(*const i32 → *const __m512i, *mut i32 → *mut __m512i) per Rust 1.94
intrinsic signatures.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
@AdaWorldAPI AdaWorldAPI merged commit 0d22e44 into master Apr 30, 2026
4 of 10 checks passed
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e24f7aa8b0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/hpc/vnni_gemm.rs
Comment on lines +61 to +62
if caps.has_avx512_vnni() {
unsafe { int8_gemm_vnni_avx512(a, b, c, m, n, k) }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Gate VNNI kernel on all declared target features

int8_gemm_vnni_avx512 is compiled with #[target_feature(enable = "avx512f,avx512vnni,avx512bw")], but dispatch only checks has_avx512_vnni() (avx512f && avx512vnni). If this runs on a CPU/VM that exposes VNNI without avx512bw, the unsafe call enters a function compiled for an unsupported feature set, which is undefined behavior and can surface as SIGILL. The runtime gate should include avx512bw (or the function should stop declaring it if truly unnecessary).

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants